IDENTIFYING ACTIVITY* 



A. S. LEWIS AND S. J. WRIGHT

Abstract. Identification of active constraints in constrained optimization is of interest from both 
practical and theoretical viewpoints, as it holds the promise of reducing an inequality-constrained 
problem to an equality-constrained problem, in a neighborhood of a solution. We study this issue in 
the more general setting of composite nonsmooth minimization, in which the objective is a composi- 
tion of a smooth vector function c with a lower semicontinuous function h, typically nonsmooth but 
structured. In this setting, the graph of the generalized gradient $\partial h$ can often be decomposed into
a union (nondisjoint) of simpler subsets. "Identification" amounts to deciding which subsets of the 
graph are "active" in the criticality conditions at a given solution. We give conditions under which 
any convergent sequence of approximate critical points finitely identifies the activity. Prominent 
among these properties is a condition akin to the Mangasarian-Fromovitz constraint qualification, 
which ensures boundedness of the set of multiplier vectors that satisfy the optimality conditions at 
the solution.

1. Introduction. We study "active set" ideas for a composite optimization
problem of the form

\[ \min_x \; h(c(x)). \tag{1.1} \]

Throughout this work, we make the following rather standard blanket assumption. 

Assumption 1. The function $h: \mathbb{R}^m \to \overline{\mathbb{R}}$ is lower semicontinuous and the
function $c: \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable. The point $\bar x \in \mathbb{R}^n$ is critical for the
composite function $h \circ c$, and satisfies the condition

\[ \partial^\infty h(c(\bar x)) \cap N(\nabla c(\bar x)^*) = \{0\}. \tag{1.2} \]

In condition (1.2), $N(\cdot)$ denotes the null space and $\partial^\infty$ denotes the horizon
subdifferential, defined below.

Some comments are in order. Because the outer function $h$ can take values in
the extended reals $\overline{\mathbb{R}} = [-\infty, +\infty]$, we can easily model constraints. In many typical
examples, $h$ is convex. We develop the general case, while noting throughout how
the theory simplifies in the convex case. For notational simplicity, we suppose that
the inner function $c$ is everywhere defined, the case where its domain is an open subset
being a trivial extension. By a critical point for $h \circ c$, we mean a point $x$ satisfying the
condition $0 \in \partial(h \circ c)(x)$. Here, $\partial$ denotes the subdifferential of a nonsmooth function.
We refer to the monographs [1], [5], [8] for standard ideas from variational analysis and
nonsmooth optimization, and in particular we follow the notation and terminology
of [8]. For continuously differentiable functions, the subdifferential coincides with the
derivative, while for convex functions it coincides with the classical convex
subdifferential. Equation (1.2) is called a regularity (or transversality) condition: $\partial^\infty$ denotes
the horizon subdifferential. If the function $h$ is lower semicontinuous, convex, and
finite at the point $\bar c$, then $\partial^\infty h(\bar c)$ is the normal cone (in the sense of classical convex
analysis) to the domain of $h$ at $\bar c$. If in addition $h$ is continuous at $\bar c$, then we have
$\partial^\infty h(\bar c) = \{0\}$.

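A standard chain rule ensures the inclusion

\[ \partial(h \circ c)(\bar x) \subset \nabla c(\bar x)^* \partial h(c(\bar x)). \]

We deduce that there exists a vector $\bar v \in \mathbb{R}^m$ satisfying the conditions

\[ \bar v \in \partial h(c(\bar x)), \qquad \nabla c(\bar x)^* \bar v = 0. \tag{1.3} \]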

By analogy with classical nonlinear programming (as we shall see), we make the 
following definition. 

Definition 1.1. A vector $\bar v \in \mathbb{R}^m$ satisfying the conditions (1.3) is called a
multiplier vector for the critical point $\bar x$.

In seeking to solve the problem (1.1), we thus look for a pair $(x, v) \in \mathbb{R}^n \times \mathbb{R}^m$
such that

\[ v \in \partial h(c(x)), \qquad \nabla c(x)^* v = 0. \tag{1.4} \]

As we have just observed, under our assumptions, this problem is solvable. On the
other hand, given any solution $(x, v)$ of the system (1.4), if the function $h$ is
subdifferentially regular at the point $c(x)$ (as holds in particular if $h$ is convex or continuously
differentiable), then we have the inclusion

\[ \nabla c(x)^* \partial h(c(x)) \subset \partial(h \circ c)(x). \]

Thus $0 \in \partial(h \circ c)(x)$, and therefore $x$ must be a critical point of the composite function
$h \circ c$.

We can rewrite the criticality system (1.4) in terms of the graph $\mathrm{gph}(\partial h)$ as
follows:

\[ (c(x), v) \in \mathrm{gph}(\partial h), \qquad \nabla c(x)^* v = 0. \]

Solving this system is often difficult, in part because the graph $\mathrm{gph}(\partial h)$ may have a
complicated structure. Active set methods from classical nonlinear programming and
its extensions essentially restrict attention to a suitable subset of $\mathrm{gph}(\partial h)$, thereby
narrowing a local algorithmic search for a critical point. We therefore make the
following definition.

Definition 1.2. An actively sufficient set for a critical point $\bar x$ of the
composite function $h \circ c$ is a set $G \subset \mathrm{gph}(\partial h)$ containing a point of the form $(c(\bar x), \bar v)$,
where $\bar v$ is a multiplier vector for $\bar x$.

The central idea we explore in this work is how to "identify" actively sufficient sets
from among the parts of a decomposition of the graph $\mathrm{gph}(\partial h)$. We present conditions
ensuring that any sufficiently accurate approximate solution of system (1.4) with the
pair $[x, h(c(x))]$ sufficiently near the pair $[\bar x, h(c(\bar x))]$ identifies an actively sufficient
set.

2. Main result. We start with a useful tool.

Lemma 2.1. Under Assumption 1, the set of multiplier vectors for $\bar x$ is nonempty
and compact.

Proof. We have already observed the existence of a multiplier vector. Since the
subdifferential $\partial h(c(\bar x))$ is a closed set, the set of multipliers must also be closed.
Assuming for contradiction that this set is unbounded, we can find a sequence $\{v_r\}$
with $|v_r| \to \infty$ and

\[ v_r \in \partial h(c(\bar x)), \qquad \nabla c(\bar x)^* v_r = 0. \]

By defining $w_r := v_r / |v_r|$, we have $|w_r| = 1$ and hence without loss of generality we
can assume $w_r \to \bar w$ with $|\bar w| = 1$. Clearly, since $w_r \in N(\nabla c(\bar x)^*)$ and the null space
is closed, we have $\bar w \in N(\nabla c(\bar x)^*)$. On the other hand, $\bar w \in \partial^\infty h(c(\bar x))$ follows from
the definition of the horizon subdifferential. Since $\bar w \neq 0$, we have a contradiction to
condition (1.2). □

We are ready to present the main result. 

Theorem 2.2. Suppose Assumption 1 holds. Consider any closed set $G \subset \mathrm{gph}(\partial h)$.
Then for any sufficiently small number $\epsilon > 0$, there exists a number $\delta > 0$
with the following property. For any point $x \in \mathbb{R}^n$ close to $\bar x$, in the sense that

\[ |x - \bar x| < \delta, \]

if there exists a pair $(c, v) \in \mathbb{R}^m \times \mathbb{R}^m$ close to $G$, in the sense that

\[ \mathrm{dist}((c, v), G) < \epsilon, \]

and such that the first-order conditions hold approximately, in the sense that

\[ v \in \partial h(c), \quad |c - c(x)| < \delta, \quad |h(c) - h(c(\bar x))| < \delta, \quad \text{and} \quad |\nabla c(x)^* v| < \delta, \]

then $G$ is an actively sufficient set for $\bar x$.

Proof. Suppose the result fails. Then $G$ is not an actively sufficient set, and yet
there exists a sequence of strictly positive numbers $\epsilon_j \downarrow 0$ as $j \to \infty$ such that, for
each $j = 1, 2, \ldots$, the following property holds: There exist sequences

\[ x_j^r \in \mathbb{R}^n, \quad c_j^r \in \mathbb{R}^m, \quad v_j^r \in \partial h(c_j^r), \quad r = 1, 2, \ldots, \]

satisfying

\[ x_j^r \to \bar x, \quad c_j^r \to c(\bar x), \quad h(c_j^r) \to h(c(\bar x)), \quad \nabla c(x_j^r)^* v_j^r \to 0, \]

as $r \to \infty$, and yet

\[ \mathrm{dist}((c_j^r, v_j^r), G) < \epsilon_j, \quad r = 1, 2, \ldots. \]

For each $j$, we can use the proof technique of Lemma 2.1 to show that the sequence
$(v_j^r)_{r=1}^\infty$ must be bounded. Thus, by taking a subsequence of the indices $r$, we can
suppose that this sequence converges to some vector $v_j$, which must be a multiplier
vector at $\bar x$. By continuity, we deduce

\[ \mathrm{dist}((c(\bar x), v_j), G) \le \epsilon_j. \]

By Lemma 2.1, the sequence $(v_j)_{j=1}^\infty$ is bounded, so after taking a subsequence of
the indices $j$, we can suppose that it converges to some multiplier vector $\bar v$. Noting
that the set $G$ is closed, we have by taking limits as $j \to \infty$ that $(c(\bar x), \bar v) \in G$,
contradicting the assumption that $G$ is not an actively sufficient set. □

An easy corollary extends from one potential actively sufficient set to many. 

COROLLARY 2.3. Suppose Assumption 1 holds. Consider any finite family $\mathcal{G}$ of
closed subsets of $\mathrm{gph}(\partial h)$. Then for any sufficiently small number $\epsilon > 0$, there exists
a number $\delta > 0$ with the following property. For any point $x \in \mathbb{R}^n$ close to $\bar x$, in the
sense that

\[ |x - \bar x| < \delta, \tag{2.1} \]

if there exists a pair $(c, v) \in \mathbb{R}^m \times \mathbb{R}^m$ close to some set $G \in \mathcal{G}$, in the sense that

\[ \mathrm{dist}((c, v), G) < \epsilon, \tag{2.2} \]

such that the first-order conditions hold approximately, in the sense that

\[ v \in \partial h(c), \quad |c - c(x)| < \delta, \quad |h(c) - h(c(\bar x))| < \delta, \quad \text{and} \quad |\nabla c(x)^* v| < \delta, \tag{2.3} \]

then $G$ is an actively sufficient set for $\bar x$.

Proof. For each set $G \in \mathcal{G}$, we apply Theorem 2.2, deducing the existence of a
number $\epsilon_G > 0$ such that the conclusion of the theorem holds for all numbers $\epsilon$ in the
interval $(0, \epsilon_G)$. Define the strictly positive number $\bar\epsilon = \min_G \epsilon_G$. We claim the result
we seek holds for all $\epsilon$ in the interval $(0, \bar\epsilon)$. To see this, we apply the theorem for each
set $G \in \mathcal{G}$ to deduce the existence of a number $\delta_G > 0$ such that the conditions (2.3)
and (2.1), with $\delta = \delta_G$, and the condition (2.2), together imply that $G$ is an actively
sufficient set for $\bar x$. The result now follows by setting $\delta = \min_G \delta_G$. □

The following result is a simple special case, easily proved directly. 

Corollary 2.4. Under the assumptions of Corollary 2.3, there exists a number
$\epsilon > 0$ such that

\[ \mathrm{dist}((c(\bar x), \bar v), G) > \epsilon \tag{2.4} \]

for all multiplier vectors $\bar v$ for the critical point $\bar x$, and all sets $G \in \mathcal{G}$ that are not
actively sufficient for $\bar x$.

Proof. In Corollary 2.3, set $x = \bar x$ and $c = c(\bar x)$. □

We end this section with another corollary, indicating how we might use the main 
result in practice. 

COROLLARY 2.5. Suppose Assumption 1 holds. Consider any finite family $\mathcal{G}$ of
closed subsets of $\mathrm{gph}(\partial h)$. Then for any sequence of points $x_r \in \mathbb{R}^n$, vectors $c_r \in \mathbb{R}^m$,
subgradients $v_r \in \partial h(c_r)$, and sets $G_r \in \mathcal{G}$ (for $r = 1, 2, \ldots$), satisfying

\[ x_r \to \bar x, \quad |c_r - c(x_r)| \to 0, \quad h(c_r) \to h(c(\bar x)), \]

\[ \nabla c(x_r)^* v_r \to 0, \quad \mathrm{dist}((c_r, v_r), G_r) \to 0, \]

as $r \to \infty$, the set $G_r$ is actively sufficient for $\bar x$ for all $r$ sufficiently large.

Proof. Apply Corollary 2.3 for any sufficiently small number $\epsilon > 0$. Then, for
the number $\delta > 0$ guaranteed by the corollary, conditions (2.1), (2.2) and (2.3) hold
for all sufficiently large $r$, so the result follows. □

3. Subdifferential graph decomposition. To apply the ideas in the previous
section, we typically assume the availability of a decomposition of $\mathrm{gph}(\partial h)$ (the graph
of the subdifferential of $h$) into some finite union of closed, not necessarily disjoint
sets $G^1, G^2, \ldots, G^k \subset \mathbb{R}^m \times \mathbb{R}^m$. For this decomposition to be useful, the sets $G^i$
should be rather simple, so that the restricted system

\[ (c(x), v) \in G^i, \qquad \nabla c(x)^* v = 0 \]

is substantially easier to solve than the original criticality system. The more refined
the decomposition, the more information we may be able to derive from the
identification process. Often we have in mind the situation where each of the sets $G^i$
is a polyhedron. We might, for example, assume that whenever some polyhedron is
contained in the list $(G^i)$, so is its entire associated lattice of closed faces.

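Example 3.1 (Scalar examples). We give some simple examples in the case
$m = 1$. Consider first the indicator function for $\mathbb{R}_+$, defined by $h(c) = 0$ for $c \ge 0$
and $+\infty$ for $c < 0$. We have

\[ \partial h(c) = \begin{cases} \emptyset & \text{if } c < 0 \\ (-\infty, 0] & \text{if } c = 0 \\ \{0\} & \text{if } c > 0. \end{cases} \]

Thus an appropriate decomposition is $\mathrm{gph}(\partial h) = G^1 \cup G^2 \cup G^3$, where

\[ G^1 = \{0\} \times (-\infty, 0], \quad G^2 = \{(0, 0)\}, \quad G^3 = [0, \infty) \times \{0\}. \]

Similar examples are the absolute value function $|\cdot|$, for which a decomposition is
$\mathrm{gph}(\partial|\cdot|) = G^1 \cup G^2 \cup G^3$, where

\[ G^1 = (-\infty, 0] \times \{-1\}, \quad G^2 = \{0\} \times [-1, 1], \quad G^3 = [0, \infty) \times \{1\} \tag{3.1} \]

(further refinable by including the two sets $\{(0, \pm 1)\}$), and the positive-part function
$\mathrm{pos}(c) = \max(c, 0)$, for which a decomposition is $\mathrm{gph}(\partial\,\mathrm{pos}) = G^4 \cup G^5 \cup G^6$, where

\[ G^4 = (-\infty, 0] \times \{0\}, \quad G^5 = \{0\} \times [0, 1], \quad G^6 = [0, \infty) \times \{1\} \tag{3.2} \]

(again refinable). A last scalar example, which involves a nonconvex function $h$, is
given by $h(c) = 1 - e^{-\alpha|c|}$ for some constant $\alpha > 0$. We have

\[ \partial h(c) = \begin{cases} \{-\alpha e^{\alpha c}\} & \text{if } c < 0 \\ [-\alpha, \alpha] & \text{if } c = 0 \\ \{\alpha e^{-\alpha c}\} & \text{if } c > 0. \end{cases} \]

An appropriate decomposition is $\mathrm{gph}(\partial h) = G^1 \cup G^2 \cup G^3$, where

\[ G^1 = \{(c, -\alpha e^{\alpha c}) : c \le 0\}, \quad G^2 = \{0\} \times [-\alpha, \alpha], \quad G^3 = \{(c, \alpha e^{-\alpha c}) : c \ge 0\}. \]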
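To make the identification idea concrete for the absolute value function, the following minimal Python sketch computes the Euclidean distance from a pair $(c, v)$ to each of the three pieces in (3.1) and reports the pieces lying within a tolerance. The function names `dist_to_pieces` and `identify`, and the tolerance argument `eps`, are illustrative assumptions and not part of the theory above.

```python
import math

def dist_to_pieces(c, v):
    """Euclidean distance from (c, v) to each piece of gph(d|.|) in (3.1)."""
    # G1 = (-inf, 0] x {-1}: project c onto (-inf, 0], v onto -1.
    d1 = math.hypot(max(c, 0.0), v + 1.0)
    # G2 = {0} x [-1, 1]: project c onto 0, v onto [-1, 1].
    d2 = math.hypot(c, max(abs(v) - 1.0, 0.0))
    # G3 = [0, inf) x {1}: project c onto [0, inf), v onto 1.
    d3 = math.hypot(max(-c, 0.0), v - 1.0)
    return {"G1": d1, "G2": d2, "G3": d3}

def identify(c, v, eps):
    """Names of the pieces whose distance to (c, v) is below eps."""
    return sorted(name for name, d in dist_to_pieces(c, v).items() if d < eps)
```

For instance, with $c(x) = x$ the only multiplier at $\bar x = 0$ is $\bar v = 0$, so only $G^2$ is actively sufficient; calling `identify(0.0, 0.01, 0.1)` on a nearby pair selects that piece alone.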



Example 3.2 (An $\ell_1$-penalty function). Consider a function $h : \mathbb{R}^2 \to \mathbb{R}$ that is
an $\ell_1$-penalty function for the constraint system $c_1 = 0$, $c_2 \le 0$, that is,

\[ h(c) = |c_1| + \max(c_2, 0). \tag{3.3} \]

Using the notation of the previous example, we have

\[ \partial h(c_1, c_2) = \partial(|\cdot|)(c_1) \times \partial\,\mathrm{pos}(c_2). \]

A decomposition of $\mathrm{gph}(\partial h)$ into nine closed sets can be constructed by using interleaved
Cartesian products of (3.1) and (3.2).

Much interest lies in the case in which the function $h$ is polyhedral, so that $\mathrm{gph}(\partial h)$
is a finite union of polyhedra. However, the latter property holds more generally for
the "piecewise linear-quadratic" functions defined in [5].

Of course, we cannot decompose the graph of the subdifferential $\partial h$ into a finite
union of closed sets unless this graph is itself closed. This property may fail, even
for quite simple functions. For example, the lower semicontinuous function $h : \mathbb{R} \to \mathbb{R}$
defined by $h(c) = 0$ for $c \le 0$ and $h(c) = 1 - c$ for $c > 0$ has subdifferential given by

\[ \partial h(c) = \begin{cases} \{0\} & \text{if } c < 0 \\ [0, \infty) & \text{if } c = 0 \\ \{-1\} & \text{if } c > 0, \end{cases} \]

so $\mathrm{gph}(\partial h)$ is not closed. On the other hand, the subdifferential graphs of lower
semicontinuous convex functions are closed.
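The failure of closedness in this example can be checked directly. The short Python sketch below encodes membership in $\mathrm{gph}(\partial h)$ for the function just described (the helper name `in_graph` is a hypothetical choice) and exhibits a sequence of graph points $(1/r, -1)$ whose limit $(0, -1)$ lies outside the graph.

```python
def in_graph(c, v):
    """Membership test for gph(dh), where h(c) = 0 for c <= 0 and 1 - c for c > 0."""
    if c < 0:
        return v == 0      # dh(c) = {0}
    if c == 0:
        return v >= 0      # dh(0) = [0, inf)
    return v == -1         # dh(c) = {-1}

# Points (1/r, -1) lie in the graph for every r >= 1 ...
points = [(1.0 / r, -1.0) for r in range(1, 6)]
all_in = all(in_graph(c, v) for c, v in points)
# ... but their limit (0, -1) does not, since -1 is not in [0, inf).
limit_in = in_graph(0.0, -1.0)
```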

In general, for any semi-algebraic function $h$, the set $\mathrm{gph}(\partial h)$ is semi-algebraic.
If this set is also closed, then it stratifies into a finite union of smooth manifolds with
boundaries. In concrete cases, a decomposition may be reasonably straightforward.

We end this section with two examples. 

Example 3.3. The graph of the subdifferential of the Euclidean norm on $\mathbb{R}^n$
decomposes into the union of the following two closed sets:

\[ \{(0, v) : |v| \le 1\} \quad \text{and} \quad \Big\{ \Big(c, \tfrac{1}{|c|} c\Big) : c \neq 0 \Big\} \cup \{(0, v) : |v| = 1\}. \]

Example 3.4. Consider the maximum eigenvalue function $\lambda_{\max}$ on the Euclidean
space $S^k$ of $k$-by-$k$ symmetric matrices (with the inner product $\langle X, Y \rangle = \mathrm{trace}(XY)$).
In this space, the following sets are closed:

\[ S^k_r = \{Y \in S^k : Y \text{ has rank} \le r\} \quad (r = 0, 1, \ldots, k) \]
\[ {}_m S^k = \{X \in S^k : \lambda_{\max}(X) \text{ has multiplicity} \ge m\} \quad (m = 1, 2, \ldots, k). \]

Trivially we can decompose the graph $\mathrm{gph}(\partial\lambda_{\max})$ into its intersection with each of
the sets ${}_m S^k \times S^k_r$. However, we can simplify, since it is well known (see [2], for
example) that $\partial\lambda_{\max}(X)$ consists of matrices of rank no more than the multiplicity
of $\lambda_{\max}(X)$. Hence we can decompose the graph into the union of the sets

\[ G_{m,r} = \mathrm{gph}(\partial\lambda_{\max}) \cap ({}_m S^k \times S^k_r) \quad (1 \le r \le m \le k). \]

To apply the theory we have developed, we need to measure the distance from any
given pair $(X, Y)$ in the graph to each of the sets $G_{m,r}$. This is straightforward,
as follows. A standard characterization of $\partial\lambda_{\max}$ [2] shows that there must exist an
orthogonal matrix $U$, a vector $x \in \mathbb{R}^k$ with nonincreasing components, and a vector
$y \in \mathbb{R}^k_+$ satisfying $\sum_i y_i = 1$ and $y_i = 0$ for all indices $i > p$, where $p$ is the multiplicity
of the largest component of $x$, such that the following simultaneous spectral
decomposition holds: $X = U^T (\mathrm{Diag}\, x) U$ and $Y = U^T (\mathrm{Diag}\, y) U$. Now define a vector $\hat x \in \mathbb{R}^k$
by replacing the first $m$ components of $x$ by their mean. (Notice that the components
of $\hat x$ are then still in nonincreasing order, and the largest component has multiplicity
at least $p$.) Define a vector $\hat y \in \mathbb{R}^k$ by setting all but the largest $r$ components of $y$
to zero and then rescaling the resulting vector to ensure its components sum to one.
(Notice that $\hat y_i = 0$ for all indices $i > p$.) Finally, define matrices $\hat X = U^T (\mathrm{Diag}\, \hat x) U$
and $\hat Y = U^T (\mathrm{Diag}\, \hat y) U$. Then, by the same subdifferential characterization, we have
$\hat Y \in \partial\lambda_{\max}(\hat X)$, so in fact $(\hat X, \hat Y) \in G_{m,r}$. Hence the distance from $(X, Y)$ to $G_{m,r}$ is
at most $\sqrt{|x - \hat x|^2 + |y - \hat y|^2}$. In fact this easily computable estimate is exact, since
it is well known that $\hat Y$ is a closest matrix to $Y$ in the set $S^k_r$ and, by [3, Example
A.4], $\hat X$ is a closest matrix to $X$ in the set ${}_m S^k$.
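The distance estimate in Example 3.4 depends only on the spectral vectors $x$ and $y$, so it can be sketched in a few lines of Python. The function below is an illustrative sketch under the assumption that $x$ (nonincreasing) and $y$ (nonnegative, summing to one) have already been extracted from a simultaneous spectral decomposition; the name `distance_estimate` and the tie-breaking rule for equal components of $y$ are our own choices.

```python
import math

def distance_estimate(x, y, m, r):
    """Estimate sqrt(|x - x_hat|^2 + |y - y_hat|^2) from Example 3.4.

    x: eigenvalues in nonincreasing order; y: nonnegative weights summing to one;
    1 <= r <= m <= len(x).
    """
    k = len(x)
    # x_hat: replace the first m components of x by their mean.
    mean_top = sum(x[:m]) / m
    x_hat = [mean_top] * m + list(x[m:])
    # y_hat: keep the r largest components of y (ties broken by lower index),
    # zero out the rest, then rescale so the components sum to one.
    keep = set(sorted(range(k), key=lambda i: (-y[i], i))[:r])
    y_trunc = [y[i] if i in keep else 0.0 for i in range(k)]
    total = sum(y_trunc)
    y_hat = [yi / total for yi in y_trunc]
    sq = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sq += sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return math.sqrt(sq)
```

For example, with `x = [3, 3, 1]` and `y = [0.5, 0.5, 0]`, the pair already lies in $G_{2,2}$, so the estimate for $(m, r) = (2, 2)$ is zero, while the estimates for $(2, 1)$ and $(3, 2)$ are strictly positive.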

4. Classical nonlinear programming. We illustrate all of our key concepts
on the special case of classical nonlinear programming, which we state as follows:

\[ (\mathrm{NLP}) \qquad \begin{array}{ll} \inf & f(x) \\ \text{subject to} & p_i(x) = 0 \quad (i = 1, 2, \ldots, s) \\ & q_j(x) \le 0 \quad (j = 1, 2, \ldots, t) \\ & x \in \mathbb{R}^n, \end{array} \]

where the functions $f, p_i, q_j : \mathbb{R}^n \to \mathbb{R}$ are all continuously differentiable. We use the
notation

\[ q^+(x) = \max(q(x), 0), \qquad q^-(x) = \min(q(x), 0), \tag{4.1} \]

where the max and min of $q(x) \in \mathbb{R}^t$ are taken componentwise. (It follows that
$q(x) = q^+(x) + q^-(x)$.)

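We can model the problem (NLP) in our composite form (1.1) by defining a
continuously differentiable function $c : \mathbb{R}^n \to \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t$ and a polyhedral function
$h : \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t \to \overline{\mathbb{R}}$ through

\[ c(x) = (f(x), p(x), q(x)) \quad (x \in \mathbb{R}^n) \tag{4.2a} \]

\[ h(u, v, w) = \begin{cases} u & \text{if } v = 0 \text{ and } w \le 0 \\ +\infty & \text{otherwise} \end{cases} \quad (u \in \mathbb{R},\ v \in \mathbb{R}^s,\ w \in \mathbb{R}^t). \tag{4.2b} \]

Clearly for any point $x \in \mathbb{R}^n$, the adjoint map $\nabla c(x)^* : \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t \to \mathbb{R}^n$ is given by

\[ \nabla c(x)^*(\theta, \lambda, \mu) = \theta \nabla f(x) + \sum_i \lambda_i \nabla p_i(x) + \sum_j \mu_j \nabla q_j(x). \]

The subdifferential and horizon subdifferential of $h$ at any point $(u, 0, w) \in \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t_-$
are given by

\[ \partial h(u, 0, w) = \{1\} \times \mathbb{R}^s \times \{\mu \in \mathbb{R}^t_+ : \langle \mu, w \rangle = 0\} \]
\[ \partial^\infty h(u, 0, w) = \{0\} \times \mathbb{R}^s \times \{\mu \in \mathbb{R}^t_+ : \langle \mu, w \rangle = 0\}. \]

(Elsewhere in $\mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t$, these two sets are respectively $\emptyset$ and $\{0\}$.)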

Armed with these calculations, consider any critical point $\bar x$ (or in particular, any
local minimizer for the nonlinear program). By assumption, $\bar x$ is a feasible solution.
Classically, the active set is

\[ \bar J = \{j : q_j(\bar x) = 0\}. \]

The regularity condition (1.2) becomes the following assumption.

ASSUMPTION 2 (Mangasarian-Fromovitz). The only pair $(\lambda, \mu) \in \mathbb{R}^s \times \mathbb{R}^t_+$
satisfying $\mu_j = 0$ for $j \notin \bar J$ and

\[ \sum_i \lambda_i \nabla p_i(\bar x) + \sum_j \mu_j \nabla q_j(\bar x) = 0 \]

is $(\lambda, \mu) = (0, 0)$.

In this framework, what we have called a multiplier vector for the critical point
$\bar x$ is just a pair $(\bar\lambda, \bar\mu) \in \mathbb{R}^s \times \mathbb{R}^t_+$ satisfying $\bar\mu_j = 0$ for $j \notin \bar J$ and

\[ \nabla f(\bar x) + \sum_i \bar\lambda_i \nabla p_i(\bar x) + \sum_j \bar\mu_j \nabla q_j(\bar x) = 0. \tag{4.3} \]

It is evident that Lemma 2.1 retrieves the classical first-order optimality conditions:
existence of Lagrange multipliers under the Mangasarian-Fromovitz constraint
qualification.

Nonlinear programming is substantially more difficult than solving nonlinear
systems of equations, because we do not know the active set $\bar J$ in advance. Active set
methods try to identify $\bar J$, since, once this set is known, we can find a stationary point
by solving the system

\[ \nabla f(x) + \sum_i \lambda_i \nabla p_i(x) + \sum_{j \in \bar J} \mu_j \nabla q_j(x) = 0 \]
\[ p_i(x) = 0 \quad (i = 1, 2, \ldots, s) \]
\[ q_j(x) = 0 \quad (j \in \bar J), \]

which is a nonlinear system of $n + s + |\bar J|$ equations for the vector $(x, \lambda, \mu_{\bar J}) \in \mathbb{R}^n \times
\mathbb{R}^s \times \mathbb{R}^{|\bar J|}$. Our aim here is to formalize this process of identification. Our approach
broadly follows that of [7], with extensive generalization to the broader framework of
composite minimization.

The classical notion of active set in nonlinear programming arises from a certain
combinatorial structure in the graph of the subdifferential $\partial h$ of the outer function $h$:

\[ \mathrm{gph}(\partial h) = \{((u, 0, w), (1, \lambda, \mu)) : w \le 0,\ \mu \ge 0,\ \langle w, \mu \rangle = 0\}. \tag{4.4} \]

We can decompose this set into a finite union of polyhedra, as follows:

\[ \mathrm{gph}(\partial h) = \bigcup_{J \subset \{1, 2, \ldots, t\}} G^J, \]

where

\[ G^J = \{((u, 0, w), (1, \lambda, \mu)) : w \le 0,\ \mu \ge 0,\ w_j = 0\ (j \in J),\ \mu_j = 0\ (j \notin J)\}. \tag{4.5} \]

According to our definition, $G^J$ is an actively sufficient set exactly when $J \subset \bar J$ and
there exist vectors $\bar\lambda \in \mathbb{R}^s$ and $\bar\mu \in \mathbb{R}^t_+$ satisfying $\bar\mu_j = 0$ for all $j \notin J$, and the
stationarity condition (4.3). We call such an index set $J$ sufficient at $\bar x$.

We next illustrate the main result. We use the notation (4.1) below. In addition,
for a vector $q \in \mathbb{R}^t$ and a nonnegative scalar $\delta$, we define $q^\delta \in \mathbb{R}^t$ as follows:

\[ q^\delta_i = \begin{cases} q_i & \text{if } q_i < -\delta \\ 0 & \text{if } q_i \ge -\delta. \end{cases} \tag{4.6} \]

Corollary 4.1. Consider a critical point $\bar x \in \mathbb{R}^n$ for the nonlinear program
(NLP), where the objective function and each of the constraint functions are all
continuously differentiable. Suppose the Mangasarian-Fromovitz condition (Assumption
2) holds. Then for any sufficiently small number $\epsilon' > 0$, there exists a number $\delta' > 0$
with the following property. For any triple $(x, \lambda, \mu) \in \mathbb{R}^n \times \mathbb{R}^s \times \mathbb{R}^t_+$ satisfying

\[ |x - \bar x| < \delta', \tag{4.7a} \]
\[ \mu_j = 0 \text{ whenever } q_j(x) < -\delta', \tag{4.7b} \]
\[ \Big| \nabla f(x) + \sum_{i=1}^s \lambda_i \nabla p_i(x) + \sum_{j=1}^t \mu_j \nabla q_j(x) \Big| < \delta', \tag{4.7c} \]

any index set $J \subset \{1, 2, \ldots, t\}$ that satisfies

\[ q_j(x) \ge -\epsilon' \text{ for all } j \in J, \tag{4.8a} \]
\[ \mu_j \le \epsilon' \text{ for all } j \notin J, \tag{4.8b} \]

is sufficient for $\bar x$.

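Proof. Applying Corollary 2.3 using the decomposition above, for any number
$\epsilon > 0$ sufficiently small, there exists a number $\delta > 0$ with the following property. For
any $(x, \theta, \lambda, \mu, f, p, q) \in \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t \times \mathbb{R} \times \mathbb{R}^s \times \mathbb{R}^t$ such that

\[ |x - \bar x| < \delta, \tag{4.9a} \]
\[ |(f, p, q) - (f(x), p(x), q(x))| < \delta, \tag{4.9b} \]
\[ |h(f, p, q) - h(f(\bar x), p(\bar x), q(\bar x))| < \delta, \tag{4.9c} \]
\[ \Big| \theta \nabla f(x) + \sum_i \lambda_i \nabla p_i(x) + \sum_j \mu_j \nabla q_j(x) \Big| < \delta, \tag{4.9d} \]
\[ (\theta, \lambda, \mu) \in \partial h(f, p, q), \tag{4.9e} \]

and for any index set $J \subset \{1, 2, \ldots, t\}$ such that

\[ \mathrm{dist}\big(((f, p, q), (\theta, \lambda, \mu)), G^J\big) \le \epsilon, \tag{4.10} \]

we have that there exist multipliers $\bar\lambda \in \mathbb{R}^s$ and $\bar\mu \in \mathbb{R}^t_+$ such that

\[ ((f(\bar x), p(\bar x), q(\bar x)), (1, \bar\lambda, \bar\mu)) \in G^J, \tag{4.11a} \]
\[ \nabla f(\bar x) + \sum_{i=1}^s \bar\lambda_i \nabla p_i(\bar x) + \sum_{j=1}^t \bar\mu_j \nabla q_j(\bar x) = 0. \tag{4.11b} \]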

To prove our claim, we need to perform three tasks. 

(i) Identify a value of $\delta'$ and values of $\theta$ and $(f, p, q)$ such that (4.9) holds
whenever $(x, \lambda, \mu)$ satisfies (4.7).

(ii) Identify a value of $\epsilon'$ such that for these choices of $x$, $(f, p, q)$, and $(\theta, \lambda, \mu)$,
the condition (4.8) implies that (4.10) is satisfied.

(iii) Prove that the outcome of Corollary 2.3, namely (4.11), implies that the index
set $J$ is sufficient.



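We start with (i). We choose $\delta' > 0$ to satisfy $\delta' \le \delta$ and $\delta' \le \epsilon/\sqrt{t}$, and also
small enough that $|x - \bar x| < \delta'$ implies

\[ |p(x) - p(\bar x)| + \sqrt{t}\,\delta' + |q^+(x) - q^+(\bar x)| < \delta, \tag{4.12a} \]
\[ |f(x) - f(\bar x)| < \delta. \tag{4.12b} \]

Now set $\theta = 1$ and $(f, p, q) = (f(x), 0, q^{\delta'}(x))$. Note that by (4.7b) and (4.6), $\mu_j = 0$
whenever $q^{\delta'}_j(x) \neq 0$, and $\mu_j \ge 0$ otherwise. We thus have from (4.4) that $(1, \lambda, \mu) \in
\partial h(f(x), 0, q^{\delta'}(x))$, so that (4.9e) holds. Since $\delta' \le \delta$ and $|x - \bar x| < \delta'$, we have (4.9a)
immediately, while (4.9d) follows from $\theta = 1$ and (4.7c).
We have from $p(\bar x) = 0$ and $q^+(\bar x) = 0$ that

\[ \begin{aligned} |(f, p, q) - (f(x), p(x), q(x))| &\le |p(x)| + |q^{\delta'}(x) - q(x)| \\ &\le |p(x) - p(\bar x)| + |q^{\delta'}(x) - q^-(x)| + |q^+(x) - q^+(\bar x)| \\ &\le |p(x) - p(\bar x)| + \sqrt{t}\,\delta' + |q^+(x) - q^+(\bar x)| < \delta, \end{aligned} \]

by (4.12a), so that (4.9b) holds. Further,

\[ \begin{aligned} |h(f, p, q) - h(f(\bar x), p(\bar x), q(\bar x))| &= |h(f(x), 0, q^{\delta'}(x)) - h(f(\bar x), p(\bar x), q(\bar x))| \\ &= |f(x) - f(\bar x)| < \delta, \end{aligned} \]

by (4.12b), so that (4.9c) holds. At this point we have completed task (i).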

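We now show (ii). Define $\epsilon' = \epsilon/\sqrt{t}$ and note that by one of our conditions on $\delta'$,
we have $\delta' \le \epsilon'$. Defining vectors $w, \hat\mu \in \mathbb{R}^t$ by

\[ w_j = \begin{cases} q_j(x) & \text{if } q_j(x) < -\epsilon' \\ 0 & \text{otherwise,} \end{cases} \qquad \hat\mu_j = \begin{cases} \mu_j & \text{if } \mu_j > \epsilon' \\ 0 & \text{otherwise,} \end{cases} \tag{4.13} \]

then by (4.8) and the definition of $G^J$ we have

\[ ((f(x), 0, w), (1, \lambda, \hat\mu)) \in G^J. \]

Since $\delta' \le \epsilon'$, we have

\[ q^{\delta'}_i(x) - w_i = \begin{cases} 0 & \text{if } q_i(x) < -\epsilon', \\ q_i(x) & \text{if } q_i(x) \in [-\epsilon', -\delta'), \\ 0 & \text{if } q_i(x) \ge -\delta'. \end{cases} \tag{4.14} \]

Thus in (4.10), using the values of $(f, p, q)$ and $(\theta, \lambda, \mu)$ defined above, we have that

\[ \begin{aligned} \mathrm{dist}\big(((f(x), 0, q^{\delta'}(x)), (1, \lambda, \mu)), G^J\big)^2 &\le \big| ((f(x), 0, q^{\delta'}(x)), (1, \lambda, \mu)) - ((f(x), 0, w), (1, \lambda, \hat\mu)) \big|^2 \\ &= |q^{\delta'}(x) - w|^2 + |\mu - \hat\mu|^2 \le t(\epsilon')^2 = \epsilon^2. \end{aligned} \]

The final inequality in this expression follows from (4.13) and (4.14) together with the
fact that we cannot have both $q^{\delta'}_i(x) \neq w_i$ and $\mu_i \neq \hat\mu_i$ for any index $i$. If $q^{\delta'}_i(x) \neq w_i$,
we have from (4.14) that $q^{\delta'}_i(x) = q_i(x) \in [-\epsilon', -\delta')$, thus $\mu_i = 0$ by (4.7b), thus $\hat\mu_i = 0$
by (4.13), thus $|\mu_i - \hat\mu_i| = 0$. We conclude that the inequality (4.10) is satisfied,
completing the proof of part (ii).

Part (iii) of the proof is immediate from the definition of a sufficient index set, so
the proof is complete. □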
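Conditions (4.8) are easy to test numerically. The Python sketch below is an illustration only: the function names, the 0-based indexing of constraints, and the sample data are our own assumptions. It computes the smallest and largest index sets compatible with (4.8) for given constraint values $q_j(x)$ and multiplier estimates $\mu_j$; an index set $J$ satisfies (4.8) exactly when it lies between these two sets.

```python
def extreme_candidate_sets(q_vals, mu, eps):
    """Smallest and largest index sets J compatible with (4.8).

    (4.8b) forces every j with mu[j] > eps into J, and (4.8a) excludes
    every j with q_vals[j] < -eps. Indices are 0-based.
    """
    t = len(q_vals)
    j_min = {j for j in range(t) if mu[j] > eps}
    j_max = {j for j in range(t) if q_vals[j] >= -eps}
    return j_min, j_max

def satisfies_4_8(J, q_vals, mu, eps):
    """Check (4.8a) and (4.8b) for a given index set J."""
    t = len(q_vals)
    in_ok = all(q_vals[j] >= -eps for j in J)
    out_ok = all(mu[j] <= eps for j in range(t) if j not in J)
    return in_ok and out_ok

# Hypothetical data: three inequality constraints near a critical point.
q_vals = [-1e-4, -0.5, -2e-3]
mu = [0.7, 0.0, 1e-3]
j_min, j_max = extreme_candidate_sets(q_vals, mu, 0.01)
```

If `j_min` is not contained in `j_max`, no index set satisfies (4.8) at the chosen tolerance, which suggests that the iterate is not yet accurate enough or that the tolerance is too small.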

5. Partial smoothness. We next observe a connection between the
decomposition ideas we have introduced and the notion of "partial smoothness" [4]. For
simplicity, in this section we restrict to the convex case, although extensions are
possible. A lower semicontinuous convex function $h: \mathbb{R}^m \to \overline{\mathbb{R}}$ is partly smooth at a point
$\bar c \in \mathbb{R}^m$ relative to a set $\mathcal{M}$ containing $\bar c$ when $\mathcal{M}$ is a manifold around $\bar c$, the
restricted function $h|_{\mathcal{M}}$ is $C^2$, and the subdifferential mapping $\partial h$ is continuous at $\bar c$
when restricted to $\mathcal{M}$, with $\partial h(\bar c)$ having affine span a translate of the normal space
to $\mathcal{M}$ at $\bar c$.

Theorem 5.1. Consider a lower semicontinuous convex function $h: \mathbb{R}^m \to \overline{\mathbb{R}}$, a
point $\bar c \in \mathbb{R}^m$, and a vector $\bar v$ lying in the relative interior of the subdifferential $\partial h(\bar c)$.
Suppose that $h$ is partly smooth at $\bar c$ relative to a closed set $\mathcal{M} \subset \mathbb{R}^m$. Then the graph
of the subdifferential $\partial h$ is the union of the following two closed sets:

\[ G^1 = \{(c, v) : c \in \mathcal{M},\ v \in \partial h(c)\}, \qquad G^2 = \mathrm{cl}\{(c, v) : c \notin \mathcal{M},\ v \in \partial h(c)\}. \]

Furthermore, the set $G^2$ does not contain the point $(\bar c, \bar v)$.

Proof. As is well known, since $h$ is convex and lower semicontinuous, $\mathrm{gph}(\partial h)$ is
closed: indeed we can write it as the lower level set of a lower semicontinuous function:

\[ \mathrm{gph}(\partial h) = \{(c, v) : h(c) + h^*(v) - \langle c, v \rangle \le 0\}, \]

where $h^*$ denotes the Fenchel conjugate of $h$. Since the set $G^1$ is just $\mathrm{gph}(\partial h) \cap
(\mathcal{M} \times \mathbb{R}^m)$, and since $\mathcal{M}$ is closed by assumption, $G^1$ is a closed subset of the graph
$\mathrm{gph}(\partial h)$. The set $G^2$ is closed by definition, and $G^2$ is also obviously a subset of
$\mathrm{gph}(\partial h)$. Therefore, we have the decomposition $\mathrm{gph}(\partial h) = G^1 \cup G^2$.

It remains to show $(\bar c, \bar v) \notin G^2$. If this property fails, then there is a sequence
of points $c_r \notin \mathcal{M}$ ($r = 1, 2, \ldots$) approaching the point $\bar c$, and a corresponding
sequence of subgradients $v_r \in \partial h(c_r)$ approaching the subgradient $\bar v$. Then a standard
subdifferential continuity argument shows $h(c_r) \to h(\bar c)$: to be precise, since $h$ and
$h^*$ are both lower semicontinuous, we have

\[ \begin{aligned} \limsup_r h(c_r) &= \limsup_r \big( \langle c_r, v_r \rangle - h^*(v_r) \big) = \langle \bar c, \bar v \rangle - \liminf_r h^*(v_r) \\ &\le \langle \bar c, \bar v \rangle - h^*(\bar v) = h(\bar c) \le \liminf_r h(c_r). \end{aligned} \]

Now [5, Thm 6.11] implies the contradiction $c_r \in \mathcal{M}$ for all large $r$. □

We illustrate by showing how partial smoothness leads to identification. 
COROLLARY 5.2. Suppose Assumption 1 holds. Suppose that the critical point $\bar x$
has a unique multiplier vector $\bar v$, and that $\bar v \in \mathrm{ri}\,\partial h(c(\bar x))$. Finally, assume that $h$ is
convex, and partly smooth at the point $c(\bar x)$ relative to a closed set $\mathcal{M} \subset \mathbb{R}^m$. Then
any sufficiently accurate solution of the criticality conditions near $\bar x$ must identify the
set $\mathcal{M}$. More precisely, for any sequence of points $x_r \in \mathbb{R}^n$, vectors $c_r \in \mathbb{R}^m$, and
subgradients $v_r \in \partial h(c_r)$ (for $r = 1, 2, \ldots$), satisfying

\[ x_r \to \bar x, \quad |c_r - c(x_r)| \to 0, \quad h(c_r) \to h(c(\bar x)), \quad \nabla c(x_r)^* v_r \to 0, \]

as $r \to \infty$, we must have $c_r \in \mathcal{M}$ for all sufficiently large $r$.

Proof. Consider the decomposition described in Theorem 5.1. Our assumptions
imply that the set $G^2$ is not actively sufficient. We now apply Corollary 2.3 to deduce
the result. □

6. Identifying Activity via a Proximal Subproblem. In this section we
consider the question of whether closed sets $G$ that are actively sufficient at a solution
$\bar x$ of the composite minimization problem can be identified from a nearby point
$x$ by solving the following subproblem:

\[ \min_d \; h\big(c(x) + \nabla c(x) d\big) + \frac{\mu}{2} |d|^2. \tag{6.1} \]

Properties of local solutions of this subproblem and of a first-order algorithm based
on it have been analyzed by the authors in [5]. In that work, we gave conditions
guaranteeing in particular that if the function $h$ is partly smooth relative to some
manifold $\mathcal{M}$ containing the critical point $\bar x$, then the subproblem (6.1) "identifies"
$\mathcal{M}$: that is, nearby local minimizers must lie on $\mathcal{M}$.

The identification result from [5] requires a rather strong regularity condition at
the critical point $\bar x$. When applied to the case of classical nonlinear programming we
described above, this condition reduces to the linear independence constraint
qualification, in particular always implying uniqueness of the multiplier vector. In the
simplest case, when, in addition, strict complementarity holds, there is a unique
sufficient index set, in the terminology of Section 4, and the identification result Corollary
5.2 applies.

By contrast, in this section, we pursue more general identification results, needing
only the transversality condition (1.2). Certain additional assumptions on the function
$h$ are required, whose purpose is essentially to ensure that the solution of (6.1) is well
behaved.

We start with some technical results from [5], and then state our main result. 

Definition 6.1. A function $h : \mathbb{R}^m \to \overline{\mathbb{R}}$ is prox-regular at a point $\bar c \in \mathbb{R}^m$
if the value $h(\bar c)$ is finite and every point in $\mathbb{R}^m \times \mathbb{R}$ sufficiently close to the point
$(\bar c, h(\bar c))$ has a unique nearest point in the epigraph $\{(c, t) : t \ge h(c)\}$.
In particular, lower semicontinuous convex functions are everywhere prox-regular, as
are sums of continuous convex functions and $C^2$ functions.

For the results that follow, we need to strengthen our underlying Assumption 1
as follows.

Assumption 3. In addition to Assumption 1, the function $c$ is $C^2$ around the
critical point $\bar x$ and the function $h$ is prox-regular at the point $\bar c = c(\bar x)$.

The following result is a restatement of [5, Theorem 6.5]. It concerns existence of
local solutions to (6.1) with nice properties.

Theorem 6.2. Suppose Assumption 3 holds. Then there exist numbers $\bar\mu \ge 0$,
$\delta > 0$ and $k > 0$ and a mapping $d: B_\delta(\bar x) \times (\bar\mu, \infty) \to \mathbb{R}^n$ such that the following
properties hold.

(a) For all points $x \in B_\delta(\bar x)$ and all scalars $\mu > \bar\mu$, the point $d(x, \mu)$ is a local
minimizer of the subproblem (6.1), and moreover satisfies $|d(x, \mu)| \le k|x - \bar x|$.

(b) Given any sequences of points $x_r \to \bar x$ and scalars $\mu_r > \bar\mu$, if either $h(c(x_r)) \to
h(\bar c)$ or $\mu_r |x_r - \bar x|^2 \to 0$, then

\[ h\big(c(x_r) + \nabla c(x_r) d(x_r, \mu_r)\big) \to h(\bar c). \tag{6.2} \]

(c) When $h$ is convex and lower semicontinuous, the results of parts (a) and (b)
hold with $\bar\mu = 0$.

The next result is a slightly abbreviated version of [5, Lemma 6.7].

Lemma 6.3. Suppose Assumption 3 holds. Then for any sequences $\mu_r > 0$ and
$x_r \to \bar x$ such that $\mu_r |x_r - \bar x| \to 0$, and any corresponding sequence of critical points
$d_r$ for the subproblem (6.1) that satisfy the conditions

\[ d_r = O(|x_r - \bar x|) \quad \text{and} \quad h\big(c(x_r) + \nabla c(x_r) d_r\big) \to h(\bar c), \tag{6.3} \]

there exists a bounded sequence of vectors $v_r$ that satisfy

\[ 0 = \nabla c(x_r)^* v_r + \mu_r d_r, \tag{6.4a} \]
\[ v_r \in \partial h\big(c(x_r) + \nabla c(x_r) d_r\big). \tag{6.4b} \]

If we assume in addition that $\mu_r > \bar\mu$, where $\bar\mu$ is defined in Theorem 6.2, the vectors
$d_r := d(x_r, \mu_r)$ satisfy the properties (6.3) and hence the results of Lemma 6.3 apply.
We now prove the main result of this section.

Theorem 6.4. Suppose Assumption 3 holds, and consider a closed set $G \subset
\mathrm{gph}(\partial h)$. Consider any sequences of scalars $\mu_r > 0$ and points $x_r \to \bar x$ satisfying the
condition $\mu_r |x_r - \bar x| \to 0$, and let $d_r$ be any corresponding sequence of critical points of
the subproblem (6.1) satisfying (6.3). Consider any corresponding sequence of vectors
$v_r$ satisfying the conditions (6.4) and also

\[ \mathrm{dist}\big((c(x_r) + \nabla c(x_r) d_r, v_r), G\big) \to 0. \tag{6.5} \]

Then $G$ is an actively sufficient set at $\bar x$.

Proof. We apply Corollary 2.5 with $\mathcal{G} = \{G\}$ and $c_r := c(x_r) + \nabla c(x_r) d_r$. Because
of the various properties of $d_r$, $v_r$, and $\mu_r$, from Theorem 6.2 and Lemma 6.3 we
have the following estimates:

\[ |x_r - \bar x| \to 0, \]
\[ v_r \in \partial h(c_r), \]
\[ |c_r - c(x_r)| = |\nabla c(x_r) d_r| = O(|d_r|) = O(|x_r - \bar x|) \to 0, \]
\[ |h(c_r) - h(c(\bar x))| \to 0, \]
\[ |\nabla c(x_r)^* v_r| = \mu_r |d_r| = \mu_r O(|x_r - \bar x|) \to 0, \]
\[ \mathrm{dist}((c_r, v_r), G) \to 0. \]

The result follows. □

Note again that Theorem 6.2 and Lemma 6.3 show that vectors $d_r$ satisfying the
conditions of Theorem 6.4 can be obtained when $\mu_r > \bar\mu$, and that we can take $\bar\mu = 0$
when $h$ is convex and lower semicontinuous.

As we have seen, in particular in the case of classical nonlinear programming,
we typically have in mind some "natural" decomposition of the subdifferential graph
$\mathrm{gph}(\partial h)$ into the union of a finite family $\mathcal{G}$ of closed subsets. We then somehow
generate sequences $\mu_r$, $x_r$, $d_r$, and $v_r$ of the type specified in the theorem, and
thereby try to identify actively sufficient sets in $\mathcal{G}$, preferring smaller sets since the
corresponding restricted criticality system is then more refined. Since $\mathcal{G}$ is a finite
family, Theorem 6.4 guarantees that we must identify at least one actively sufficient
set in this way. However, we may not identify all actively sufficient sets $G \in \mathcal{G}$ in
this way. In other words, a sequence of iterates generated by the algorithm based on
(6.1) and corresponding multiplier vectors may "reveal" some of the actively sufficient
sets but not others. We illustrate this point with an example based on a degenerate
nonlinear optimization problem in two variables.

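Example 6.1. Consider the map $c : \mathbb{R}^2 \to \mathbb{R}^3$ defined by

$$ c(x) = \begin{pmatrix} -x_1 \\ x_1^2 + x_2^2 - 1 \\ (x_1 + 1)^2 + x_2^2 - 4 \end{pmatrix} $$

and the function $h : \mathbb{R}^3 \to \bar{\mathbb{R}}$ defined by

$$ h(c) = \begin{cases} c_1 & \text{if } c_2, c_3 \le 0, \\ +\infty & \text{otherwise.} \end{cases} $$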



Minimizing the composite function $h \circ c$ thus amounts to maximizing $x_1$ over the set
in $\mathbb{R}^2$ defined by the constraints $|x| \le 1$ and $|x - (-1, 0)^T| \le 2$. The unique minimizer
of $h \circ c$ is the point $\bar x = (1, 0)^T$, at which $c(\bar x) = (-1, 0, 0)^T$. The set of multiplier
vectors is

$$ \partial h(c(\bar x)) \cap N(\nabla c(\bar x)^*) = \big\{ \alpha (1, \tfrac12, 0)^T + (1 - \alpha)(1, 0, \tfrac14)^T : \alpha \in [0, 1] \big\}. $$

One decomposition of $\operatorname{gph}(\partial h)$ is as the union of the following four closed sets:

$$ G^1 = \{ (c_1, c_2, c_3, 1, 0, 0) : c_2 \le 0,\ c_3 \le 0 \}, $$
$$ G^2 = \{ (c_1, 0, c_3, 1, v_2, 0) : v_2 \ge 0,\ c_3 \le 0 \}, $$
$$ G^3 = \{ (c_1, c_2, 0, 1, 0, v_3) : c_2 \le 0,\ v_3 \ge 0 \}, $$
$$ G^4 = \{ (c_1, 0, 0, 1, v_2, v_3) : v_2 \ge 0,\ v_3 \ge 0 \}. $$

(We can refine further, but this suffices for our present purpose.) In this decomposition,
the actively sufficient subsets are $G^2$, $G^3$, and $G^4$.

The subproblem (6.1), applied from some point $x = (x_1, 0)^T$ with $x_1$ close to 1,
reduces to

$$ \begin{aligned} \text{minimize} \quad & -d_1 + \frac{\mu}{2}(d_1^2 + d_2^2) \\ \text{subject to} \quad & d_1 \le \frac{1 - x_1^2}{2 x_1}, \\ & d_1 \le \frac{2}{x_1 + 1} - \frac{x_1 + 1}{2}, \\ & d \in \mathbb{R}^2. \end{aligned} $$

If $x_1 = 1 - \epsilon$ for some small $\epsilon$ (not necessarily positive), the constraints reduce to

$$ d_1 \le \epsilon + \tfrac12 \epsilon^2 + O(\epsilon^3), $$
$$ d_1 \le \epsilon + \tfrac14 \epsilon^2 + O(\epsilon^3). $$

Provided $\epsilon \ll \mu^{-1}$, the solution of the subproblem has $d_1 \approx \epsilon + \tfrac14 \epsilon^2$ and $d_2 = 0$. The
corresponding linearized values of $c_2$ and $c_3$ are

$$ c_2(x) + \nabla c_2(x)^T d \approx -\tfrac12 \epsilon^2, \qquad c_3(x) + \nabla c_3(x)^T d = 0, $$

and the corresponding multiplier vector is $v \approx (1, 0, \tfrac14)^T$. Thus this iterate "reveals"
the actively sufficient sets $G^3$ and $G^4$, but not $G^2$.

Subsequent iterates generated by this scheme have the identical form $(1 - \epsilon, 0)^T$
with successively smaller values of $\epsilon$, so the sequence satisfies the property (6.5) only
for $G = G^3$ and $G = G^4$, but not for $G = G^2$.
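
The computation in Example 6.1 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes the components $c_1 = -x_1$, $c_2 = |x|^2 - 1$, $c_3 = |x - (-1,0)^T|^2 - 4$ implied by the constraints in the example, fixes $\mu = 1$ by default, and uses a function name (`solve_example_subproblem`) of our own choosing. Because the subproblem objective is separable and both constraints are upper bounds on $d_1$, the solution is available in closed form.

```python
def solve_example_subproblem(eps, mu=1.0):
    """Solve the prox subproblem of Example 6.1 at x = (1 - eps, 0).

    Assumed data (inferred from the constraints in the example):
    c1 = -x1, c2 = |x|^2 - 1, c3 = |x - (-1, 0)|^2 - 4.
    Returns (d1, linearized c2, linearized c3, multiplier vector v).
    """
    x1 = 1.0 - eps
    bound2 = (1.0 - x1**2) / (2.0 * x1)            # from linearized c2 <= 0
    bound3 = 2.0 / (x1 + 1.0) - (x1 + 1.0) / 2.0   # from linearized c3 <= 0
    # The objective -d1 + (mu/2)(d1^2 + d2^2) is separable and strictly
    # convex, so d2 = 0 and d1 is the unconstrained minimizer 1/mu clipped
    # to the tighter of the two upper bounds.
    d1 = min(1.0 / mu, bound2, bound3)
    lin_c2 = (x1**2 - 1.0) + 2.0 * x1 * d1
    lin_c3 = ((x1 + 1.0) ** 2 - 4.0) + 2.0 * (x1 + 1.0) * d1
    # Stationarity in d1: -1 + mu*d1 + 2*x1*v2 + 2*(x1 + 1)*v3 = 0, with a
    # nonzero multiplier only on the constraint that determines d1. On an
    # exact tie between the two bounds we arbitrarily assign it to c3.
    v2 = v3 = 0.0
    if d1 == bound3:
        v3 = (1.0 - mu * d1) / (2.0 * (x1 + 1.0))
    elif d1 == bound2:
        v2 = (1.0 - mu * d1) / (2.0 * x1)
    return d1, lin_c2, lin_c3, (1.0, v2, v3)


for eps in (1e-1, 1e-2, 1e-3):
    d1, lin_c2, lin_c3, v = solve_example_subproblem(eps)
    print(eps, d1, lin_c2, lin_c3, v)
```

Under these assumptions, the printed multipliers have zero second component and a third component approaching $\tfrac14$ as $\epsilon$ shrinks, while the linearized value of $c_2$ stays strictly negative, of order $\epsilon^2$. This matches the observation that the iterates approach $G^3$ and $G^4$ but carry no information about $G^2$.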

Consider again the nonlinear programming formulation of Section 2. In that
framework, for a given point $x \in \mathbb{R}^n$, the proximal subproblem (6.1) is the following
quadratic program:

$$ \text{minimize} \quad f(x) + \nabla f(x)^T d + \frac{\mu}{2} |d|^2 \tag{6.6a} $$
$$ \text{subject to} \quad p_i(x) + \nabla p_i(x)^T d = 0 \quad (i = 1, 2, \ldots, s), \tag{6.6b} $$
$$ q_j(x) + \nabla q_j(x)^T d \le 0 \quad (j = 1, 2, \ldots, t), \tag{6.6c} $$
$$ d \in \mathbb{R}^n. \tag{6.6d} $$

We derive the following corollary as a simple application of Theorem 6.4.

Corollary 6.5. Consider the nonlinear program (NLP), where the functions $f$,
$p_i$ ($i = 1, 2, \ldots, s$) and $q_j$ ($j = 1, 2, \ldots, t$) are all $C^2$ around the critical point $\bar x$, and
suppose that the Mangasarian-Fromovitz constraint qualification holds.
Consider sequences of scalars $\mu_r > 0$ and points $x_r \to \bar x$ satisfying $\mu_r |x_r - \bar x| \to 0$,
let $d_r$ be the corresponding (unique) solution of (6.6), and consider an additional
sequence of positive tolerances $\epsilon_r \to 0$. Then for all sufficiently large $r$, the index
set $J(r) \subset \{1, 2, \ldots, t\}$ defined by

$$ J(r) := \{ j : q_j(x_r) + \nabla q_j(x_r)^T d_r \ge -\epsilon_r \} \tag{6.7} $$

is sufficient for $\bar x$.

Proof. Suppose the result fails, so that by taking a subsequence, we can assume
that $J(r)$ is constant: $J(r) = J$ for all $r$, where $J$ is not sufficient for $\bar x$. Noting
convexity of the function $h$ defined in Section 3 and the equivalence of the transversality
condition (1.2) and the Mangasarian-Fromovitz constraint qualification, we have from
Theorem 6.4 that the unique solution of the subproblem (6.6) satisfies $d_r = O(|x_r - \bar x|)$ and

$$ h\big(c(x_r) + \nabla c(x_r) d_r\big) = f(x_r) + \nabla f(x_r)^T d_r \to f(\bar x) = h(c(\bar x)). $$

The distance between the point

$$ \Big( \big( f(x_r) + \nabla f(x_r)^T d_r,\ p(x_r) + \nabla p(x_r) d_r,\ q(x_r) + \nabla q(x_r) d_r \big),\ (1, \lambda_r, \mu_r) \Big) $$

and the set $G^J$ defined in (4.5) approaches zero, where $\lambda_r$ and $\mu_r$ are the multipliers
for the linear constraints in the subproblem (6.6). We conclude from Theorem 6.4
that $G^J$ is an actively sufficient set at $\bar x$, so that the index set $J$ is sufficient. This is
a contradiction. □
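
The index set (6.7) is simple to compute once the subproblem step is available. The Python sketch below is a minimal illustration, not an implementation from the paper: the function name `estimate_active_set`, the 0-based indexing, and the test data (the two inequality constraints assumed for Example 6.1, with $\mu = 1$ so that the step is determined by the second constraint) are our own choices.

```python
def estimate_active_set(q_vals, q_grads, d, tol):
    """Return J = {j : q_j(x) + grad q_j(x)^T d >= -tol}, using 0-based indices."""
    active = []
    for j, (q_j, grad_j) in enumerate(zip(q_vals, q_grads)):
        linearized = q_j + sum(g * d_i for g, d_i in zip(grad_j, d))
        if linearized >= -tol:
            active.append(j)
    return active


# Illustrative data: the two inequality constraints assumed for Example 6.1,
# evaluated at x = (1 - eps, 0) with the subproblem step d = (d1, 0).
eps = 1e-2
x1 = 1.0 - eps
q_vals = [x1**2 - 1.0, (x1 + 1.0) ** 2 - 4.0]
q_grads = [(2.0 * x1, 0.0), (2.0 * (x1 + 1.0), 0.0)]
d = (2.0 / (x1 + 1.0) - (x1 + 1.0) / 2.0, 0.0)

print(estimate_active_set(q_vals, q_grads, d, tol=1e-6))  # [1]
print(estimate_active_set(q_vals, q_grads, d, tol=1e-3))  # [0, 1]
```

With the tight tolerance only the second constraint is selected, since the linearized value of the first is about $-\tfrac12\epsilon^2$; with the looser tolerance both are selected. In this example either index set corresponds to an actively sufficient set, which illustrates why the corollary only requires $\epsilon_r \to 0$ rather than a particular rate.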

Similar results hold for a nonsmooth penalty formulation of the nonlinear program
(NLP). For example, the $\ell_1$-penalty formulation corresponds to the function $h$ defined
as follows:

$$ h(u, v, w) = u + \nu \Big( \sum_{i=1}^{s} |v_i| + \sum_{j=1}^{t} \max(w_j, 0) \Big). $$

The corresponding proximal subproblem (6.1) at some given point $x \in \mathbb{R}^n$ is as follows:

$$ \min_{d \in \mathbb{R}^n} \; f(x) + \nabla f(x)^T d + \frac{\mu}{2} |d|^2 + \nu \Big( \sum_{i=1}^{s} \big| p_i(x) + \nabla p_i(x)^T d \big| + \sum_{j=1}^{t} \max\big( q_j(x) + \nabla q_j(x)^T d,\ 0 \big) \Big) $$

for a given penalty parameter $\nu > 0$. A result similar to Corollary 6.5 for this
formulation would lead to an identification result like Theorem 3.2 of [7], provided that
$\nu$ is large enough to bound the $\ell_\infty$ norm of all multipliers that satisfy the stationarity
conditions for (NLP). A notable difference, however, is that [7, Theorem 3.2] uses
a trust region of the form $\|d\|_\infty \le \Delta$ to restrict the size of the solution $d$, whereas
this subproblem uses the prox term $\frac{\mu}{2}|d|^2$. Although the use of an $\ell_\infty$ trust region
allows the subproblem to be formulated as a linear program, the radius $\Delta$ must satisfy
certain conditions, not easily verified, for the identification result to hold. By contrast,
there are no requirements on $\mu$ in the subproblems above, beyond positivity.
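
The role of the penalty parameter $\nu$ can be seen on a small instance. The Python sketch below is a hedged illustration only: it reuses the data assumed for Example 6.1, read as a nonlinear program with $f(x) = -x_1$, no equality constraints, and inequality constraints $q_1 = |x|^2 - 1$ and $q_2 = |x - (-1,0)^T|^2 - 4$. At a point $x = (x_1, 0)^T$ the model depends on $d_2$ only through the prox term, so it suffices to minimize over $d_1$. The helper names and the use of ternary search are our own choices.

```python
def penalty_model(d1, x1, mu, nu):
    """l1-penalty prox model along d = (d1, 0) for the assumed Example 6.1 data.

    Assumes f(x) = -x1, no equality constraints, and inequality constraints
    q1 = |x|^2 - 1 <= 0, q2 = |x - (-1, 0)|^2 - 4 <= 0. The constant f(x)
    is dropped.
    """
    lin_q1 = (x1**2 - 1.0) + 2.0 * x1 * d1
    lin_q2 = ((x1 + 1.0) ** 2 - 4.0) + 2.0 * (x1 + 1.0) * d1
    return -d1 + 0.5 * mu * d1**2 + nu * (max(lin_q1, 0.0) + max(lin_q2, 0.0))


def minimize_convex_1d(phi, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


x1, mu = 0.99, 1.0
# Step of the constrained subproblem (6.6) for this data (second constraint active).
qp_step = 2.0 / (x1 + 1.0) - (x1 + 1.0) / 2.0
step_large_nu = minimize_convex_1d(lambda t: penalty_model(t, x1, mu, nu=1.0), -2.0, 2.0)
step_small_nu = minimize_convex_1d(lambda t: penalty_model(t, x1, mu, nu=0.1), -2.0, 2.0)
print(qp_step, step_large_nu, step_small_nu)
```

In this instance the multipliers at the solution have $\ell_\infty$ norm at most $\tfrac12$, so $\nu = 1$ exceeds the bound and the penalized step coincides with the step of the constrained subproblem. With $\nu = 0.1$ the penalty is too weak, and the minimizer moves well past both linearized constraints.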

A possible extension we do not pursue here allows an extra term $\frac12 \langle d, Bd \rangle$ for some
monotone operator $B$, in addition to the prox term $\frac{\mu}{2}|d|^2$. This generalization allows
SQP-type subproblems to be considered, potentially useful in analyzing algorithms
combining identification and second-order steps into a single iteration (as happens
with traditional SQP methods).